Abstract. Model checking, initially successful in the field of hardware 
design, has recently been applied to software. One of the chief advantages 
of model checking is the production of counterexamples demonstrating 
that a system does not satisfy a specification. However, it may require a 
great deal of human effort to extract the essence of an error from even a 
detailed source-level trace of a failing run. We use an automated method 
for finding multiple versions of an error (and similar executions that do 
not produce an error), and analyze these executions to produce a more 
succinct description of the key elements of the error. The description 
produced includes identification of portions of the source code crucial 
to distinguishing failing and succeeding runs, differences in invariants 
between failing and non-failing runs, and information on the necessary 
changes in scheduling and environmental actions needed to cause suc- 
cessful runs to fail. In addition, this analysis allows a classification of 
errors by features such as whether they are purely concurrent (i.e. can 
be induced by changing only thread scheduling). 


1 Introduction 

In model checking [4], algorithms are used to systematically determine whether 
a system satisfies a specification. One of the major advantages of model check- 
ing in comparison to such methods as theorem proving is the production of 
a counterexample that provides a detailed example of how the system violates 
the specification when verification fails. However, even a detailed trace of how 
a system violates a specification may not provide enough information to easily 
understand (much less remedy) the problem with the system. Indeed, when the 
model of the system is in any sense abstracted from the real implementation, 
simply determining whether an error is indeed a fault in the system or merely 
a consequence of modeling assumptions or incorrect specification can be quite 
difficult. 

We attempt to extract more information from a single counterexample pro- 
duced by model checking in order to facilitate understanding of errors in a sys- 
tem (or problems with the specification of a system). We focus in this work 
on finite executions demonstrating violation of safety properties (e.g. assertion 
violations, uncaught exceptions, and deadlocks) but believe it can be extended 
to other types of counterexamples. The key to this approach is to first define 
(and then find) multiple variations on a single counterexample (other versions 
of the “same” error). From this definition naturally arises that of a set of exe- 
cutions that are variations in which the error does not occur. We call the first 
set of executions negatives and the second set positives. Analysis of the com- 
mon features of negatives and the differences between positives and negatives 
may yield more succinct and useful feedback than reading only the original 
counterexample. 

One approach to analysis would be to define the negatives as all executions 
that reach a particular error state (all deadlocks, all assertion violations, etc.). 
This definition has major drawbacks. A complex concurrent program, for exam- 
ple, may have many deadlocks that have different causes. Attempts to extract 
any common features from the negatives are likely to fail or be computationally 
expensive (for example, requiring clustering) in this case. The second problem 
is that positives would presumably be any executions not ending in the error 
state, again making comparison difficult. In software, at least, we usually think 
of errors as occurring at a particular place — e.g., a deadlock at a particular 
synchronization, or a failure of a particular assertion or array-out-of-bounds er- 
ror at a particular point in the source code. We define negatives, therefore, as 
executions that not only end in the same error state, but that reach it from the 
same control location. Rather than analyzing all deadlocks, our definition focuses 
analysis on deadlocks that occur after the same attempt to acquire a lock, for 
example. We believe that our definition formally captures a simplified version of 
the programmer’s intuitive notion of “the same error.” Positives are then defined 
as executions that pass through that control location without proceeding to an 
error state. 

Error explanation is especially important in the context of model checking. 
When model checking is applied to software implementations, one of two cases 
is likely to hold. In the first, the model checking is done under the guidance 
of the designers and implementors of the program, and only after testing has exposed 
most of the less subtle bugs. Any remaining errors are likely to be quite complex 
and difficult to understand (since discovery of the rare but catastrophic failure is 
in some sense the motivation of model checking). The other case in which model 
checking is applied to software implementations currently is that the verification 
is being performed by model checking experts who are not intimately familiar 
with the program being examined, and are relying on a high level specification 
of its behavior. In this event, even if simpler bugs are unveiled, understanding 
whether they are spurious or indeed involve violations of a correct part of the 
specification without an intuitive knowledge of the program can be quite difficult. 
In either case, automated analysis focusing attention on the most important 
parts of the error and highlighting the difference between failing and succeeding 
runs should be very useful. 

This paper is organized as follows: in section 2 we discuss related work. The 
definitions of negative and positive executions are then formalized in section 3, 
followed by a presentation of an algorithm for generating executions to analyze 
in section 4. The various analyses currently applied are discussed in section 5, and 
the implementation is described in section 6. We then present a larger case study 
and experimental results in section 7, followed by conclusions and future work. 


2 Related Work 

The most closely related work to ours is that of Ball, Naik, and Rajamani [1]. 
They find successful paths to the control location at which an error is discovered 
in order to find the cause of the error. Once a cause is discovered, they model 
check a restricted model in which the system is restricted from executing the 
causal transitions to discover if other causes for the error are possible. This error 
analysis has been implemented for the SLAM [2] tool. 

Sharygina and Peled [13] propose the notion of the neighborhood of a coun- 
terexample and suggest that an exploration of this region may be useful in 
understanding an error. However, the exploration, while aided by a testing tool, 
is essentially manual and offers no automatic analysis. No formal notion of other 
versions of the same error is presented. Dodoo, Donovan, Lin and Ernst [5] use 
the Daikon invariant detector to discover differences in invariants between pass- 
ing and failing test cases, but propose no means to restrict the cases to similar 
executions relevant for analysis or to generate them automatically from a coun- 
terexample. 

Jin, Ravi and Somenzi [11] proceed from the same starting point of analyzing 
counterexamples produced by a model checker. Their goal is also similar: 
providing feedback beyond the original counterexample in order to 
deal with the complexity of errors. Fate and free will are terms in a concurrent 
reachability game in which a counterexample is broken into parts depending on 
whether the environment (attempting to force the system into an error state) 
or the system (attempting to avoid error) controls it. This is an alternative 
approach to understanding errors, and produces a different kind of explanation (an 
alternation of fated and free segments). 

The work of Andreas Zeller was also an important influence on this work. 
Delta debugging is a technique for minimizing error trails that works by con- 
ducting a modified binary search between a failing run and a succeeding run of a 
program [16]. Zeller has extended this notion to other approaches to automatic 
debugging, including modifying portions of a program’s state to isolate cause- 
effect chains [15] and discovering the minimal difference in thread scheduling 
necessary to produce a concurrency-based error [3]. Our computation of transformations 
between positive and negative executions was inspired by this approach, 
particularly in that we look for minimal transformations. 

3 Definitions 

The crucial definitions are those of negatives and positives, the two classes of 
executions we use in our analysis. While manual exploration of paths near a 
counterexample can be useful [13], a formal definition of a variation on a 
counterexample is necessary before proceeding to the more fruitful approach of 
automatic generation and analysis of relevant executions. Intuitively, we examine 
the full set of finite executions in which the program reaches the control location 
immediately preceding the error state. 

A labeled transition system (LTS) is a 4-tuple $(S, S_0, Act, T)$, where $S$ is a 
finite non-empty set of states, $S_0 \subseteq S$ is the set of initial states, $Act$ is the set 
of actions, and $T \subseteq S \times Act \times S$ is the transition relation. We assume that 
$S$ contains a distinguished set of error states (with no outgoing transitions), 
$\Pi = \{\pi_0, \ldots, \pi_n\}$ (representing, e.g., deadlock, assertion violation, uncaught 
exception, etc.). In our model, we also introduce a set $C$ of control locations and 
a set $D$ of data valuations, such that $S = (C \times D) \cup \Pi$, and introduce partial 
projection functions $c : S \to C$ and $d : S \to D$. We write $s \xrightarrow{a} s'$ as shorthand 
for $(s, a, s') \in T$. 

A finite transition sequence from $s_0 \in S$ is a sequence 
$t = s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_k} s_k$, where $0 < k < \infty$. 
We refer to $k$ as the length of $t$, also denoted by $|t|$. We say that a finite 
transition sequence $t = s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_k} s_k$ 
is a prefix of a finite transition sequence 
$t' = s'_0 \xrightarrow{a'_1} s'_1 \xrightarrow{a'_2} \cdots \xrightarrow{a'_{k'}} s'_{k'}$ if 
$0 < k \le k'$ and $\forall i \le k . (i \ge 0 \Rightarrow s_i = s'_i) \wedge (i > 0 \Rightarrow a_i = a'_i)$. 
We say that $t$ is a control suffix of $t'$ if $0 < k \le k'$ and 
$\forall i \le k . (i \ge 0 \Rightarrow c(s_{k-i}) = c(s'_{k'-i})) \wedge (i > 0 \Rightarrow a_{k-i+1} = a'_{k'-i+1})$. 
We also define the empty transition sequence, emp, as consisting of no states or 
actions, where $|emp| = 0$. 

We consider the class of counterexamples that are finite transition sequences 
from $s_0 \in S_0$. Given an initial counterexample 
$t = s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_k} s_k$, 
where $s_k \in \Pi$, we define a negative as an execution that results in the same 
error state from the same control location (the original counterexample is itself 
a negative). Formally: 

Definition: Negative: A negative (with respect to a particular $t$, as noted 
above) is a finite transition sequence from $s'_0 \in S_0$, 
$t' = s'_0 \xrightarrow{a'_1} s'_1 \xrightarrow{a'_2} \cdots \xrightarrow{a'_{k'}} s'_{k'}$, 
where $0 < k' < \infty$, such that: 

1. $c(s_{k-1}) = c(s'_{k'-1}) \wedge a_k = a'_{k'}$, and 

2. $s_k = s'_{k'}$. 

We then define $neg(t)$ as the set of all negatives with respect to a counterexample 
$t$. The original counterexample itself is one such negative, and is used as 
such in all analyses. 

Definition: Positive: A positive (with respect to $t$) is a finite transition 
sequence from $s'_0 \in S_0$, 
$t' = s'_0 \xrightarrow{a'_1} s'_1 \xrightarrow{a'_2} \cdots \xrightarrow{a'_{k'}} s'_{k'}$, 
where $0 < k' < \infty$, such that: 

1. $c(s_{k-1}) = c(s'_{k'-1}) \wedge a_k = a'_{k'}$, 

2. $s'_{k'} \notin \Pi$, and 

3. $\forall t'' \in neg(t) . t'$ is not a prefix of $t''$. 

We define $pos(t)$ as the set of all positives with respect to a counterexample 
$t$, and $var(t)$ as $neg(t) \cup pos(t)$, the set of all variations on the original 
counterexample. We will henceforth refer to neg and pos, omitting the implied 
parameterization with respect to $t$. 



Fig. 1. A counterexample, a negative, and a positive. 


Figure 1 shows an example. The numbers inside states indicate the control 
location of the state, $c(s)$, and the letters beside the arrows are the labels of 
actions (in this case drawn from the alphabet $\{a, b\}$). The original counterexample 
ends in the state $A \in \Pi$, indicating an assertion violation. The negative shown 
takes a different sequence of actions but also passes through the control location 
3, takes an a action, and transitions to the error state $A$. The positive reaches 
control location 3 but in a data state such that taking an a action transitions to 
a non-error state. 

These basic definitions, however, give rise to certain difficulties in practice. 
First, the set of negatives is potentially infinite, as is the set of positives. On the 
other hand, the set of positives may be empty, as an error in a reactive system is 
often reachable from any other state. For reasons of tractability we generate and 
analyze subsets of the negatives and positives. When only a subset of the negatives 
is known, the third condition in the definition of positives cannot be checked; 
we therefore replace it with the weaker requirement that $t'$ not be a prefix of 
any negative we generate. 
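
The first two conditions of each definition depend only on the final step of an execution, which the following Java sketch makes concrete. It is an illustration under simplifying assumptions, not the implementation described later: the class `VariationClassifier`, the type `LastStep`, and the string names for error states are hypothetical, and the prefix condition on positives is deliberately not checked.

```java
import java.util.Arrays;
import java.util.List;

class VariationClassifier {
    enum Kind { NEGATIVE, POSITIVE, NEITHER }

    /** Hypothetical error-state names standing in for the set Pi. */
    static final List<String> ERROR_STATES =
            Arrays.asList("deadlock", "assertion", "exception");

    /** Last step of an execution: control location left, action taken, state reached. */
    static final class LastStep {
        final int control;
        final String action;
        final String target;

        LastStep(int control, String action, String target) {
            this.control = control;
            this.action = action;
            this.target = target;
        }
    }

    static Kind classify(LastStep cex, LastStep candidate) {
        // Condition 1 of both definitions: same control location, same final action.
        boolean sameSite = cex.control == candidate.control
                && cex.action.equals(candidate.action);
        if (!sameSite) return Kind.NEITHER;
        // Negative: the candidate ends in the same error state as the counterexample.
        if (candidate.target.equals(cex.target)) return Kind.NEGATIVE;
        // Potential positive: ends outside Pi (prefix condition not checked here).
        if (!ERROR_STATES.contains(candidate.target)) return Kind.POSITIVE;
        return Kind.NEITHER; // reaches a different error state
    }

    public static void main(String[] args) {
        LastStep cex = new LastStep(3, "a", "assertion");
        System.out.println(classify(cex, new LastStep(3, "a", "assertion")));
        System.out.println(classify(cex, new LastStep(3, "a", "ok")));
        System.out.println(classify(cex, new LastStep(2, "a", "assertion")));
    }
}
```

An execution classified as `POSITIVE` here is only a potential positive; it must still be discarded if it turns out to be a prefix of a generated negative.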

4 Generation of Positives and Negatives 

The algorithm for generating a subset of the negatives (and a set of potential 
positives, per the modified prefix condition) uses a model checker to explore 
backwards from the original counterexample. We describe an explicit-state 
algorithm, but it seems evident that SAT-based bounded model checking approaches 
would also be possible. 

We assume that the model checker (MC) can be called as a function during 
generation with an initial state $s$ from which to begin exploration, a maximum 
search depth $d$, a control state to match $c$, an error state $\pi$, and a visited set 
$v$. The model checker returns two (possibly empty) sets, $n$ (negatives) and $p$ 
(potential positives), and a new visited set $v'$. The generation algorithm (Figure 
2) takes as input an initial counterexample 
$t = s_0 \xrightarrow{a_1} s_1 \cdots \xrightarrow{a_k} s_k$ and a search depth $d$. 


generate(t, d) 
  v := ∅ 
  neg := ∅ 
  pos := ∅ 
  i := k - 1 
  while i >= 0 
    (n, p, v) := MC(s_i, d, c(s_{k-1}), s_k, v) 
    neg := neg ∪ n 
    pos := pos ∪ p 
    i := i - 1 
  for all t_p ∈ pos 
    for all t_n ∈ neg 
      if t_p is a prefix of t_n 
        pos := pos \ {t_p} 
  return (neg, pos) 


Fig. 2. Algorithm for generation of negatives and positives. 


The model checking algorithm used is not specified. If a depth limit is not 
given, each call to the model checker will only terminate upon exploring the full 
reachable state space from $s_i$. In the case that a depth limit is used, we alter 
the behavior of the model checker. When the depth limit is reached, we attempt 
to extend the execution to match the original counterexample. This causes the 
depth limit to behave as an edit-distance from the original counterexample: 
negatives and positives may deviate from the original execution for a number of 
actions limited by d. The algorithm for extension, proceeding from a state s is 
given in Figure 3. Briefly, the algorithm checks the state at which exploration 
terminates due to depth limiting to see if it matches control location with any 
state further along the original counterexample. For all matches, the actions 
taken in the original counterexample are repeated if enabled in order to reach 
either a negative or a positive. 

We use neg and pos below to denote the sets returned by this generation 
algorithm, not the true complete sets of negatives and positives. 

5 Analysis of Variations 

Once the negatives and positives have been generated, it remains to produce 
from them useful feedback for the user. Even without such analysis, the traces 
may prove useful, but our experience shows that even tightly limited searches will 



  j := i 
  while j < k 
    if c(s_j) = c(s) 
      s' := s 
      l := j + 1 
      broken := false 
      while l < k ∧ ¬broken 
        if ∃ s'' . s' -a_l-> s'' ∧ c(s'') = c(s_l) ∧ s'' ∉ v 
          s' := s'' 
        else 
          broken := true 
        l := l + 1 
      if ¬broken 
        if s'' ∈ Π 
          add transition sequence to current set of negatives 
        else 
          add transition sequence to current set of positives 
    j := j + 1 


Fig. 3. Algorithm for extension. 


produce large numbers of traces that are as difficult to understand in isolation 
as the original counterexample. It is not the traces in and of themselves that 
provide leverage in understanding the error; any negative could have generally 
been substituted for the original counterexample, and a positive simply shows 
an instance of the program reaching a control location without error. It is true 
that one use of the negatives is possible without further analysis: they can be 
added to regression tests so that it can be determined if a fix for the original 
counterexample covers all found versions of the original problem. 


5.1 Transition Analysis 

The various analyses we employ are designed to characterize (1) the common 
elements of negatives/positives and (2) the difference between negatives and pos- 
itives. For this analysis, we examine the presence of transitions in the executions 
in each set. In particular we compute sets containing projected transitions, pairs 
$(c, a)$, where $c \in C$ is a control location and $a \in Act$ is an action. We say that 
the finite transition sequence $t = s_0 \xrightarrow{a_1} s_1 \xrightarrow{a_2} \cdots \xrightarrow{a_k} s_k$ contains $(c, a)$ iff 
$\exists n < k . c(s_n) = c \wedge a_{n+1} = a$. The analysis below can also be computed using 
only projected control locations, ignoring actions (or also projecting on some 
portion of a composite action, when this is possible). 

In transition analysis, we compute a number of sets of transitions, listed in 
Table 1. trans(neg) and trans(pos) are complete sets of all transitions appearing 
in negatives and positives, respectively. The sets all(neg) and all(pos) (transi- 
tions appearing in all negatives or positives) are reported directly to the user. 
These may be sufficient to explain an error, either by indicating that certain code 
is faulty or that execution of certain code prevents the error from appearing. Also 
reported to the user are the transitions appearing only in negatives/positives, 
only(neg) and only(pos). Finally, if non-empty, the potentially causal transition 
sets are reported. 



Transition Analysis Set    Definition 
trans(neg)                 $\{(c, a) \mid \exists t \in neg . t \text{ contains } (c, a)\}$ 
trans(pos)                 $\{(c, a) \mid \exists t \in pos . t \text{ contains } (c, a)\}$ 
all(neg)                   $\{(c, a) \mid \forall t \in neg . t \text{ contains } (c, a)\}$ 
all(pos)                   $\{(c, a) \mid \forall t \in pos . t \text{ contains } (c, a)\}$ 
only(neg)                  $trans(neg) \setminus trans(pos)$ 
only(pos)                  $trans(pos) \setminus trans(neg)$ 
cause(neg)                 $all(neg) \cap only(neg)$ 
cause(pos)                 $all(pos) \cap only(pos)$ 


Table 1. Transition analysis set definitions. 


The rationale for computing causal sets is that in many cases all(neg) and 
all(pos) will contain a number of common elements, due to common initialization 
code and aspects of execution unrelated to the error. The sets only(neg) and only(pos) 
may also be large if the error induces differing behavior in the system before 
the point at which the error is detected. When non-empty, cause(neg) and 
cause(pos) are potentially much smaller and denote precisely 
the common behavior that differentiates the negative and positive sets. The 
error cause localization algorithm used in SLAM is comparable to reporting 
cause(neg), although it is based on transitions defined as pairs of projected control 
locations, and computation of all(neg) is unnecessary as their analysis only 
uses one negative at a time [1]. 
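
The set definitions of Table 1 translate directly into set operations. The Java sketch below is one possible rendering, assuming each execution has already been reduced to its set of projected transitions encoded as strings (for example "3F" for location 3 with choice false). The class name, the encoding, and the toy runs in `main` are illustrative assumptions rather than data from our experiments.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TransitionAnalysis {
    /** trans(X): every projected transition appearing in some execution of X. */
    static Set<String> trans(List<Set<String>> runs) {
        Set<String> out = new TreeSet<>();
        for (Set<String> r : runs) out.addAll(r);
        return out;
    }

    /** all(X): transitions appearing in every execution of X (empty if X is empty). */
    static Set<String> all(List<Set<String>> runs) {
        if (runs.isEmpty()) return new TreeSet<>();
        Set<String> out = new TreeSet<>(runs.get(0));
        for (Set<String> r : runs) out.retainAll(r);
        return out;
    }

    /** only(X) = trans(X) \ trans(Y). */
    static Set<String> only(List<Set<String>> x, List<Set<String>> y) {
        Set<String> out = trans(x);
        out.removeAll(trans(y));
        return out;
    }

    /** cause(X) = all(X) intersected with only(X). */
    static Set<String> cause(List<Set<String>> x, List<Set<String>> y) {
        Set<String> out = all(x);
        out.retainAll(only(x, y));
        return out;
    }

    /** Convenience constructor for one execution's transition set. */
    static Set<String> run(String... ts) {
        return new TreeSet<>(Arrays.asList(ts));
    }

    public static void main(String[] args) {
        // Toy executions, chosen only to exercise the definitions.
        List<Set<String>> neg = Arrays.asList(
                run("1", "2", "3F", "7", "8", "10", "11T"),
                run("1", "2", "3F", "3T", "4", "5", "7", "8", "10", "11T"));
        List<Set<String>> pos = Arrays.asList(
                run("1", "2", "3T", "4", "5", "7", "8"));
        System.out.println("cause(neg) = " + cause(neg, pos));
        System.out.println("cause(pos) = " + cause(pos, neg));
    }
}
```

Returning the empty set for all(X) when X is empty is a choice made for this sketch; strictly, a universal quantification over an empty set of executions holds for every transition.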



 1  int got_lock = 0; 
 2  do { 
 3    if (Verify.randomBool()) { 
 4      lock(); 
 5      got_lock++; 
 6    } 
 7    if (got_lock != 0) { 
 8      unlock(); 
 9    } 
10    got_lock--; 
11  } while (Verify.randomBool()); 

    public static void lock() { 
      Verify.assertTrue(LOCK == 0); 
      LOCK = 1; 
    } 

    public static void unlock() { 
      Verify.assertTrue(LOCK == 1); 
      LOCK = 0; 
    } 


Fig. 4. Example #1. 


Example of Transition Analysis The Java code in Figure 4 (adapted from 
an example used by Henzinger, Jhala, Majumdar, and Sutre [10]) calls lock and 
unlock methods that assert that the lock is not held and the lock is held, 
respectively. Verify.randomBool() indicates a nondeterministic choice between 
true and false (see Section 6). The bug (line 10 should be inside the scope of 
the if starting at line 7) can appear as a violation of either the lock or unlock 
assertion. 

We begin error analysis from a counterexample in which the unlock assertion 
is violated: $1 \to 2 \to 3 \xrightarrow{F} 7 \to 10 \to 11 \xrightarrow{T} 3 \xrightarrow{F} 7 \to 8 \to A$. We use 
a search depth of 30. 




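Transition Analysis Set    Elements 
all(neg)                   $\{1, 2, \langle 3, F \rangle, 7, 8, 10, \langle 11, T \rangle\}$ 
all(pos)                   $\{1, 2, \langle 3, T \rangle, 4, 5, 7, 8\}$ 
only(neg)                  $\{\langle 3, F \rangle, 10, \langle 11, T \rangle\}$ 
only(pos)                  $\emptyset$ 
cause(neg)                 $\{\langle 3, F \rangle, 10, \langle 11, T \rangle\}$ 
cause(pos)                 $\emptyset$ 


Table 2. Transition analysis example results. 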


In this case cause(neg) is unchanged by our use of the weaker prefix 
constraint for positives. Here cause(neg) notes the key points of the unlocking 
error: the system chooses not to lock ($\langle 3, F \rangle$), which means that the decrement 
of got_lock (10) is incorrect (the lock's status has not been changed this time 
through the loop). If we reiterate the loop ($\langle 11, T \rangle$), it is now possible to try to 
unlock when the lock has not been acquired. 

5.2 Invariant Analysis 

Transition analysis is useful when the control flow or action choices independent 
of ordering are sufficient to explain an error. However, the same actions from the 
same control locations may be present in both negatives and positives; it may 
be that the choice of an action with respect to d(s ) rather than c(s) is crucial. 
A set-based approach projected on d(s) rather than c(s) faces the problem that 
only certain data values are likely to be relevant, rather than the full state. 

Instead, we compute data invariants over the negatives and compare them to 
the invariants over the positives. Specifically, the user may choose certain control 
locations as instrumentation points. The value of d(s) (or some projection over 
certain variables of the data state) is recorded for each transition sequence every 
time the control flow reaches the instrumentation locations. We then compute 
invariants using Daikon [6] (see section 6 for details) with respect to each of 
the instrumentation points over all negatives and all positives. The invariants 
for negatives are then compared to the invariants for positives, and the user is 
presented with this difference. 


Example of Invariant Analysis The code in Figure 5 is intended to sort 
the variables a, b, c and d in ascending order. The last line asserts that the 
variables are ordered. However, the comparisons are not sufficient to ensure 
ordering. Verify.instrumentPoint indicates a point at which $d(s)$ is recorded 
(and a name for that instrumentation point). Applying invariant analysis with 
a search depth of 30 yields the following differences (values after sorting, at the 
instrumentation point post-sort, are indicated by primed variable names): 

We observe from the negative invariants that a' may be greater than b'. 
Because invariant analysis is complete over the negative and positive runs, the 
absence of an a' <= c' invariant for negatives also indicates that a' is greater 
than c' in at least one negative. Adding only the a, b comparison to the code 




int a = Verify.random(4); int b = Verify.random(4); // nondeterministic 0-4 
int c = Verify.random(4); int d = Verify.random(4); // nondeterministic 0-4 
int temp = 0; 

Verify.instrumentPoint("pre-sort"); 
if (a > b) { 
  temp = b; b = a; a = temp; } // Swap 
if (b > c) { 
  temp = c; c = b; b = temp; } // Swap 
if (c > d) { 
  temp = d; d = c; c = temp; } // Swap 
if (b > c) { 
  temp = c; c = b; b = temp; } // Swap 
Verify.instrumentPoint("post-sort"); 

Verify.assertTrue((a <= b) && (b <= c) && (c <= d)); 

Fig. 5. Example #2. 


Instrumentation Point    Positive Invariant           Negative Invariant 
pre-sort                 a >= 0                       a >= 1 
                         b <= d                       a <= b 
                                                      a > c 
                                                      b > c 
post-sort                a' >= 0                      a' >= 1 
                         a' <= b'                     a' > b' 
                         a' <= c' 
                         (illegible in source)        b' < d' 
                         d' >= temp                   d' > temp 


Table 3. Invariant analysis example results. 


before again model checking and analyzing the resulting counterexample gives 
the remaining crucial invariant difference: b' <= c' (positive) vs. b' > c' (negative). 
Adding this comparison results in code that satisfies the sorting assertion. 
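
The repair can be checked exhaustively, since each variable ranges over only 0-4. The Java sketch below replays the compare-and-swap steps of Figure 5 over all 625 inputs, with and without the two added comparisons. The class `SortCheck` is hypothetical, and placing the added a, b and b, c comparisons after the existing four steps is an assumption of this sketch; the text above does not say where they are inserted.

```java
class SortCheck {
    // Compare-and-swap steps as index pairs into {a, b, c, d}.
    static final int[][] ORIGINAL = {{0, 1}, {1, 2}, {2, 3}, {1, 2}};
    // Assumed repair: an a,b comparison followed by a final b,c comparison.
    static final int[][] FIXED = {{0, 1}, {1, 2}, {2, 3}, {1, 2}, {0, 1}, {1, 2}};

    /** Runs the chosen step sequence and evaluates the sorting assertion. */
    static boolean sorts(int a, int b, int c, int d, boolean fixed) {
        int[] v = {a, b, c, d};
        for (int[] s : fixed ? FIXED : ORIGINAL) {
            if (v[s[0]] > v[s[1]]) {
                int temp = v[s[1]];
                v[s[1]] = v[s[0]];
                v[s[0]] = temp;
            }
        }
        return v[0] <= v[1] && v[1] <= v[2] && v[2] <= v[3];
    }

    /** Counts inputs in 0..4 (the range of Verify.random(4)) violating the assertion. */
    static int failures(boolean fixed) {
        int count = 0;
        for (int a = 0; a <= 4; a++)
            for (int b = 0; b <= 4; b++)
                for (int c = 0; c <= 4; c++)
                    for (int d = 0; d <= 4; d++)
                        if (!sorts(a, b, c, d, fixed)) count++;
        return count;
    }

    public static void main(String[] args) {
        System.out.println("original failures: " + failures(false));
        System.out.println("fixed failures:    " + failures(true));
    }
}
```

For instance, the input a = 2, b = 3, c = 4, d = 1 leaves the original sequence in the state 2, 1, 3, 4, violating a <= b, while the extended sequence sorts it.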


5.3 Transformation of Positives into Negatives 

Our final analysis is based on the intuition that when both negatives and 
positives exist, we can imagine "breaking" a positive by changing the least number 
of actions required to produce a negative. If a positive and a negative follow the 
same path for a long sequence of states and actions, then diverge for a period 
before again rejoining paths, the difference in actions in the divergent section 
may give important insights into the cause of the error. Our extension algorithm 
(Figure 3) is intended to find such pairs of negatives and positives. 

We say that there is a transformation of a positive 
$t = s_0 \xrightarrow{a_1} s_1 \cdots \xrightarrow{a_k} s_k$ 
into a negative $t' = s'_0 \xrightarrow{a'_1} s'_1 \cdots \xrightarrow{a'_{k'}} s'_{k'}$ when: 

1. $\exists p$ . $p$ is a finite transition sequence which is a prefix of both $t$ and $t'$. 

2. $\exists u$ . $u$ is a finite transition sequence which is a control suffix of both the 
largest prefix of $t$ and the largest prefix of $t'$. 

Note that as the final states of $t$ and $t'$ do not share a control location, we 
must take the largest prefixes of both in order to allow for the existence of $u$. 





Fig. 6. Transforming a positive into a negative. 


A minimal transformation from $t$ to $t'$ always exists when there is a 
transformation from $t$ to $t'$. We define the minimal transformation as a 3-tuple $(k_t, t_p, t_n)$ 
where $0 < k_t < |t|$ and $t_p$ and $t_n$ are either finite transition sequences or the 
empty transition sequence, emp. We may also write $(t_p) \to (t_n)$ when we are 
considering only the actual sequences replaced and not the location from which 
they begin (discarding $k_t$ allows us to see when the same alteration of actions 
from different positions causes an error in a number of positives). The tuple is 
computed as follows: 

1. Find the $p$ such that $p$ is the largest (maximizing $|p|$) finite transition 
sequence which is a prefix of both $t$ and $t'$. 

2. Find the $u$ such that $u$ is the largest finite transition sequence which is a 
control suffix of both the largest prefix of $t$ and the largest prefix of $t'$ and $u$ 
satisfies the constraint that $|u| + |p| < \min(|t|, |t'|)$. 

3. $k_t = |p|$. 

4. $t_p = s_{k_t} \xrightarrow{a_{k_t+1}} \cdots s_{k-|u|}$. If $k_t > k - |u|$ then $t_p = emp$. 

5. $t_n = s'_{k_t} \xrightarrow{a'_{k_t+1}} \cdots s'_{k'-|u|}$. If $k_t > k' - |u|$ then $t_n = emp$. 
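
The steps above can be approximated on a simplified representation. The Java sketch below treats each execution as a list of visited control locations, strips the longest common prefix and then the longest common suffix that does not overlap it, and returns the two differing middle segments. It is a loose illustration only: the class and field names are hypothetical, states and actions are collapsed into single list elements, and the shared boundary locations are excluded from the reported segments, unlike $t_p$ and $t_n$ as defined above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MinimalTransformation {
    final int kt;           // length of the shared prefix
    final List<String> tp;  // segment removed from the positive
    final List<String> tn;  // segment substituted to obtain the negative

    MinimalTransformation(int kt, List<String> tp, List<String> tn) {
        this.kt = kt;
        this.tp = tp;
        this.tn = tn;
    }

    static MinimalTransformation of(List<String> pos, List<String> neg) {
        int limit = Math.min(pos.size(), neg.size());
        // Step 1: longest common prefix.
        int p = 0;
        while (p < limit && pos.get(p).equals(neg.get(p))) p++;
        // Step 2: longest common suffix that does not overlap the prefix.
        int u = 0;
        while (p + u < limit
                && pos.get(pos.size() - 1 - u).equals(neg.get(neg.size() - 1 - u))) {
            u++;
        }
        // Steps 3-5: whatever lies between prefix and suffix in each execution.
        return new MinimalTransformation(p,
                new ArrayList<>(pos.subList(p, pos.size() - u)),
                new ArrayList<>(neg.subList(p, neg.size() - u)));
    }

    public static void main(String[] args) {
        // Illustrative paths through the code of Figure 4.
        List<String> pos = Arrays.asList("1", "2", "3", "4", "5", "7", "8");
        List<String> neg = Arrays.asList("1", "2", "3", "7", "10", "11", "3", "7", "8");
        MinimalTransformation m = of(pos, neg);
        System.out.println(m.kt + ": " + m.tp + " -> " + m.tn);
    }
}
```

Sorting the results of `of` over all pairs by `tp.size() + tn.size()` would give the size-ordered listing of transformations discussed in the text.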

When $S_0$ contains a single state, there will exist a minimal transformation 
for every pair in $pos \times neg$. Sorting this set by a metric of transformation size 
($|t_p| + |t_n|$ is one reasonable choice, though this ignores similarities within the 
transformation) yields a description of increasingly complex ways to cause a 
successful execution to fail. This set (along with the associated positive(s) and 
negative(s) for each transformation) can aid understanding of aspects of an error 
(such as timing or threading issues) that are not expressible by either transition 
or invariant analysis. For example, if a positive can be transformed into a negative 
by changing actions that represent thread/process scheduling choices only, 
an error can be immediately classified as a concurrency problem. Additionally, 
we reapply the transition analysis with the values of $t_p$ replacing pos and the 
values of $t_n$ replacing neg. Concentrating on the changes necessary to cause positives 
to become negatives may yield causal transitions when none are discovered 
by the first analysis (because the context in which the transitions are executed 
is important: they are causal only under certain conditions). 

Returning to the example in Figure 4, running transformation analysis gives 
us two distinct minimal transformations: 
$(3 \xrightarrow{T} 4 \to 5 \to 7) \to (3 \xrightarrow{F} 7 \to 10 \to 11 \xrightarrow{T} 3 \xrightarrow{F} 7)$ and 
$(3 \xrightarrow{T} 4 \to 5 \to 7) \to (3 \xrightarrow{F} 7 \to 10 \to 11 \xrightarrow{T} 3 \xrightarrow{T} 4 \to 5 \to 7 \to 10 \to 11 \xrightarrow{T} 3 \xrightarrow{F} 7 \to 8 \to 10 \to 11 \xrightarrow{T} 3 \xrightarrow{F} 7)$. 
The first of these can be read as "the error will occur in this execution 
if, rather than choosing to acquire the lock ($t_p$), the system, in a state where 
got_lock == 0, decrements got_lock, then chooses to loop around and again 
chooses not to acquire the lock ($t_n$)." The second example produces the negative 
in which the lock is acquired once, so that it is only on the second iteration 
through the loop that the value of got_lock becomes incorrect with respect to the 
guard in line 7. 


6 Implementation 

We implemented our algorithm for generating and analyzing variations inside 
the Java PathFinder model checker [14]. Java PathFinder (JPF) is an explicit 
state on-the-fly model checker that takes compiled Java programs (i.e. bytecode 
class-files) and analyzes all paths through the program for deadlock, assertion 
violations and linear time temporal logic (LTL) properties. In this implementation 
we only consider safety properties. We hope to consider the analysis of 
LTL counterexamples in future work. JPF is unusual in that it is built on a 
custom-made Java Virtual Machine (JVM) and therefore does not require any 
translation to an existing model checker's input notation. Actions of an environment 
not under the control of the Java program are represented in JPF 
as nondeterministic choices, introduced with special Verify.randomBool() or 
Verify.random(int i) calls which are trapped by the model checker. For example, 
Verify.random(2) will nondeterministically return a value in the range 
0-2, inclusive. In terms of the LTS model used above, $Act = (t \times n)$, where 
$t$ is a non-negative integer identifying the thread executing in the step, and 
$n$ is either a non-negative integer indicating a nondeterministic choice resulting 
from a Verify call or -1, indicating that no such call was made. $\Pi$ is the set 
{deadlock, assertion, exception}, indicating that there is a deadlock, an assertion 
was violated, or an uncaught exception was raised. States are the various 
states of the JVM (including states for each member of $\Pi$). $c(s)$ returns a set of 
control locations (bytecode positions), one for each thread in the current state, 
allowing for further projection of the control location along each thread. 



Our implementation of error explanation makes use of JPF’s various search 
capabilities to provide a wide range of possible searches during the generation 
of variations, including heuristic searches [8]. 

We have added the ability to produce Daikon [6] trace files to JPF. Daikon 
is a tool that takes trace files generated by instrumented code and discovers 
invariants over the set of traces. We use Daikon for invariant analysis. The other 
analysis techniques are implemented inside JPF. In JPF, all executions start 
from the same initial state of the JVM, so the full transformation set always 
exists. For transition analysis JPF allows various projections on actions, such 
as ignoring nondeterministic choice or selected thread, as well as analysis based 
only on control location. In the JPF implementation, we universally use, rather 
than the $c(s)$ defined above, a projection that produces only the control location 
of the thread that is executed from a state ($c(s, a)$). We believe this to be 
an improvement in any case where there are well-defined control locations for 
threads or processes in the LTS model. 


7 Case Study/Experimental Results 

We applied error explanation to determine the cause of the time-partitioning 
error in an early version of the DEOS real-time operating system used by Hon- 
eywell in small business aircraft. We have studied this system before [12] and 
at that time we did not know what the error was, only that there was an error 
in the system. When we found the error it took us hours to determine that the 
counterexample given was in fact non-spurious (see footnote 1) and, more precisely, the error 
we were looking for. Given this experience and the fact that the DEOS error is 
very subtle, we believed this to be a good test of the error explanation approach. 

The DEOS system is written in C++ and is approximately 10,000 lines of
code; we worked with a 1,500-line slice of the system that contains all the
parts necessary to show the error. We also worked with a Java translation of the
code in order to use the JPF model checker. DEOS is a real-time operating
system based on rate-monotonic scheduling that allows user threads to make
kernel calls during their execution. For example, a thread can yield the CPU by
making a WaitUntilNextPeriod call or remove itself by making a Delete call.
Furthermore, since threads can have different priorities, a thread can be
interrupted by a higher-priority thread when a SystemTick happens (indicating
the start of a new scheduling period), or it can use up all of its allotted
time, indicated by a TimerInterrupt. The property we were checking was a safety
property ensuring time partitioning (a thread always gets the amount of time it
asked for), checked by an assertion whenever a new thread is to be scheduled.

JPF found the original error in 52 seconds (on a 2.2 GHz Pentium with 2 GB
of memory), and then spent another 102 seconds performing a depth-limit 30
analysis to explain the error (finding 131 variations on the error in the process).
The resulting output indicated the following key points:

¹ We abstracted the system by replacing real time with our own virtual time;
hence we occasionally got spurious errors.



- The Delete call is present in all negatives.
- The shortest transformations from positive runs to negatives are:
  • replacing a WaitUntilNextPeriod with a Delete call;
  • inserting a TimerInterrupt and a SystemTick before a Delete call.


This shows that the Delete call is essential to the error, but only in specific 
circumstances. This matches the cause of the known error, where a Delete call 
is performed after a specific amount of time has elapsed (the variable indicating 
that time has passed and should be subtracted from a thread’s budget is not 
properly handled during deletion). Note that making a Delete call by itself is 
not sufficient to cause the error, since there are positives containing this call. 
It took approximately 15 minutes to analyze the output file produced from the 
error explanation to determine the cause. 
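The two kinds of observation reported above (actions common to all negatives, and short transformations from positives to negatives) can be approximated in a few lines. The following is a sketch under stated assumptions, not the analysis implemented in JPF: runs are simplified to lists of action names, the class NegativeAnalysisSketch and its methods are hypothetical, and plain Levenshtein distance over action sequences stands in for the transformation measure.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; runs are reduced to lists of action names.
final class NegativeAnalysisSketch {

    /** Actions that occur in every run of the given set (empty if there are no runs). */
    static Set<String> inAll(List<List<String>> runs) {
        Set<String> common = new LinkedHashSet<>();
        if (runs.isEmpty()) {
            return common;
        }
        common.addAll(runs.get(0));
        for (List<String> run : runs) {
            common.retainAll(run);
        }
        return common;
    }

    /** Levenshtein distance over action sequences: insertions, deletions, replacements. */
    static int editDistance(List<String> from, List<String> to) {
        int[][] d = new int[from.size() + 1][to.size() + 1];
        for (int i = 0; i <= from.size(); i++) {
            d[i][0] = i; // delete every remaining action of 'from'
        }
        for (int j = 0; j <= to.size(); j++) {
            d[0][j] = j; // insert every remaining action of 'to'
        }
        for (int i = 1; i <= from.size(); i++) {
            for (int j = 1; j <= to.size(); j++) {
                int replace = from.get(i - 1).equals(to.get(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + replace,
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[from.size()][to.size()];
    }
}
```

Under this simplification, replacing a WaitUntilNextPeriod with a Delete is a transformation of distance 1, and inserting two actions before a Delete is a transformation of distance 2. An action returned by inIn all negatives is a candidate for being essential to the error only if it is checked against the positives as well, since, as noted above, some positives also contain a Delete call.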

We also applied error analysis to concurrency errors such as those in the
Remote Agent [9]. Transformation analysis indicates when an error can be
induced by only changing thread scheduling, and shows the minimal changes in
scheduling necessary to induce the error in previously successful runs.
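One simple way to approximate the check that a negative differs from a positive only in thread scheduling is to compare, thread by thread, the sequence of nondeterministic choices each run makes. The sketch below is an assumption about how such a check could look, not the transformation analysis itself: the class SchedulingOnlySketch is hypothetical, and each action is encoded as an int pair {thread, choice} with choice -1 when no Verify call was made.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; an action is encoded as {thread, choice}.
final class SchedulingOnlySketch {

    /** Groups the choice values of a run by thread, preserving per-thread order. */
    static Map<Integer, List<Integer>> choicesByThread(List<int[]> run) {
        Map<Integer, List<Integer>> byThread = new TreeMap<>();
        for (int[] action : run) {
            byThread.computeIfAbsent(action[0], t -> new ArrayList<>()).add(action[1]);
        }
        return byThread;
    }

    /** True when both runs make the same choices in each thread, so only the interleaving can differ. */
    static boolean sameUpToScheduling(List<int[]> positive, List<int[]> negative) {
        return choicesByThread(positive).equals(choicesByThread(negative));
    }
}
```

If some negative is the same as a positive up to scheduling in this sense, the error is a candidate for classification as purely concurrent. The check is deliberately coarse: it ignores control locations and treats two identical runs as trivially equivalent.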


8 Conclusions and Future Work 

We propose definitions for two kinds of variations on a counterexample
discovered during model checking and present an algorithm for generating a
subset of these variations. These successful and failing executions are then
used by various analysis routines to provide users with a variety of indications
as to the important aspects of the original counterexample. The analyses
suggested provide feedback on (1) control locations and actions key to the
error, (2) data invariant differences key to the error, and (3) means of
transforming successful executions into counterexamples. While further
experimental validation is needed, our results demonstrate that this analysis
can be very useful in understanding complex errors.

The most important area of further research should be improving the methods
of analysis both to provide more useful feedback and to do more automatic
classification of errors. While the goal of routinely reporting “change line i
in the following manner” is unlikely ever to be reached, we believe that better
methods than the rudimentary ones presented here may exist. In particular,
automatic analysis of the transformations between positives and negatives should
be taken a step further than merely noting concurrency-only differences. Another
possibility is to generate from the negatives an automaton for an environment
that avoids reproducing the error, as in the work of Giannakopoulou, Pasareanu,
and Barringer [7]. It is possible that in some instances such an assumption
might succinctly characterize the error, although as an assumption it would only
be an approximation of the most general environment for the program.